Effect Sizes

An effect size is a quantitative answer to your research question.

More About Effect Sizes

Effect Sizes Can Be Simple or Complex

  • If you want to know the degree of neurogenesis in the hippocampus of the adult mouse, then the mean number of new neurons (M) observed would be an effect size for your study. Alternatively, you might choose the median number of neurons (Mdn).

  • If you wanted to know how much stress impacts neurogenesis, then your effect size of interest would likely be the mean difference in neurogenesis between control and stressed mice: Mdiff = Mstressed - Mcontrol. This effect size is a contrast, representing the difference between groups. While mean differences are typical, you might also consider the median difference: Mdndiff = Mdnstressed - Mdncontrol.

  • If you wanted to know how much sex influences the effect of stress on neurogenesis, then your effect size of interest would likely be the mean difference on the difference. That is, you’d be interested in the simple effect of stress on males, the simple effect of stress on females, and the difference between those differences:

    • MStress In Males = Mmales_stressed - Mmales_control. This is the simple effect for males.

    • MStress In Females = Mfemales_stressed - Mfemales_control. This is the simple effect for females.

    • Minteraction aka MΔΔ = MStress In Females - MStress In Males. This is the difference between these simple effects (how different the stress effect is in females relative to males). This is a quantitative expression of a 2x2 interaction. You’ll sometimes see the symbol delta-delta (ΔΔ) or the phrase “difference on the difference” because this effect size represents the difference between two effects (the difference between two differences).

Again, you could be focused on means, but you can also use medians as your measure of what’s typical, finding the difference: Mdninteraction aka MdnΔΔ = MdnStress In Females - MdnStress In Males.
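
To make these contrasts concrete, here is a minimal R sketch using made-up cell means (all numbers are hypothetical, chosen only to illustrate the arithmetic):

# Made-up cell means (hypothetical numbers, for illustration only)
m_males_control    <- 3500
m_males_stressed   <- 2700
m_females_control  <- 3400
m_females_stressed <- 2200

# Simple effects of stress within each sex
m_stress_in_males   <- m_males_stressed   - m_males_control    # -800
m_stress_in_females <- m_females_stressed - m_females_control  # -1200

# The interaction: the difference between the simple effects
m_interaction <- m_stress_in_females - m_stress_in_males       # -400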

Categorical Outcomes Have Effect Sizes, Too

We’ve focused here on an experiment with a quantitative outcome (number of neurons), but we may instead have a categorical outcome (e.g. depressed or not). In this case, our effect size is typically a proportion (Pdepressed = NDepressed / NTotal). From there, we have the same progression from simple to more and more complex contrasts:

  • Depression in a single group, say a random sample of women living in the USA: PWomen_Depressed = NWomen_Depressed / NWomen_Total

  • A simple difference or contrast, say the difference in rates of depression for women who have received an intervention vs. controls: Pdiff = PWomen_Depressed_Treated - PWomen_Depressed_Control. This would express the difference in depression rates in the treated group relative to the control group.

  • A complex contrast, say the degree to which men and women differ in response to a treatment for depression:

    • PTreatment_Effect_Women = PWomen_Depressed_Treated - PWomen_Depressed_Control. This is the simple effect for women.

    • PTreatment_Effect_Men = PMen_Depressed_Treated - PMen_Depressed_Control. This is the simple effect for men.

    • Pinteraction aka PΔΔ = PTreatment_Effect_Women - PTreatment_Effect_Men. Again, this is a quantitative expression of an interaction, representing how much the treatment effect on depression differs between women and men.
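
Here is the same arithmetic sketched for proportions, again with made-up counts used purely for illustration:

# Made-up counts (hypothetical, for illustration only)
p_women_treated <- 12 / 50   # proportion of treated women who are depressed
p_women_control <- 20 / 50   # proportion of control women who are depressed
p_men_treated   <- 15 / 50
p_men_control   <- 18 / 50

# Simple treatment effects within each sex
treatment_effect_women <- p_women_treated - p_women_control    # -0.16
treatment_effect_men   <- p_men_treated   - p_men_control      # -0.06

# Difference on the difference: how much the treatment effect differs by sex
p_interaction <- treatment_effect_women - treatment_effect_men # -0.10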

There are Lots of Other Simple and Complex Effect Sizes

An effect size is a quantitative answer to a research question. Given the diversity of research questions, it should come as no surprise that there are lots of other effect sizes.

Pearson’s r is an effect size – it answers the question: to what degree are these variables linearly related to each other? And, again, you can have a simple effect size (How much are hippocampal volume and spatial reasoning related?) and contrasts that involve differences in effect sizes (How much do men and women differ in the relationship between hippocampal volume and spatial reasoning? – this question is about the difference in r values between these groups).
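
As a quick illustration (simulated data, not a real study), base R’s cor.test() reports r for a single sample along with a 95% confidence interval; for the difference in r between two independent groups, statpsych’s ci.cor2() (listed at the end of this section) estimates that contrast.

# Simulated (fake) data for illustration
set.seed(1)
hippocampal_volume <- rnorm(40, mean = 3.5, sd = 0.3)
spatial_reasoning  <- 0.5 * hippocampal_volume + rnorm(40, sd = 0.25)

# Pearson's r with its 95% CI
cor.test(hippocampal_volume, spatial_reasoning)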

Variance is also an effect size! You might be interested, for example, in diversity in brain volume – in which case the effect size of interest might be the variance in brain volume observed across a set of scans. And, again, simple effects often lead to questions comparing effects (To what extent is diversity in brain size altered in autism? This would be a question about the difference in variance observed between a population with autism and normal controls).
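
Here is a similarly hedged sketch for variance as an effect size, using simulated data; base R’s var.test() gives a confidence interval for the ratio of two variances.

# Simulated (fake) brain-volume data, arbitrary units
set.seed(2)
autism_group  <- rnorm(30, mean = 1200, sd = 140)
control_group <- rnorm(30, mean = 1200, sd = 90)

# Variance as a simple effect size for each group
var(autism_group)
var(control_group)

# CI for the ratio of the two variances (a comparison of diversity)
var.test(autism_group, control_group)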

We could go on, but hopefully you get the point: although scientists most typically examine differences in means and medians, there are lots of different effect sizes.

One thing you may have noticed is that “effect size” is a poor turn of phrase – it describes a quantitative outcome of any study, whether observational or experimental. If we compare ventricle size in patients with schizophrenia and controls, we are conducting an observational study, and yet we still call the difference observed an effect size.

Effect Sizes Need Expressions of Uncertainty

We determine an effect size from a finite set of data, but wish to draw conclusions about the world at large. There will always be uncertainty in generalizing from sample to population, and therefore every effect size needs an accompanying uncertainty interval - an expression of how wrong the observed effect size might be.

There are a surprising number of options for quantifying uncertainty. First, there are radically different statistical approaches: you could use a frequentist approach (that’s the approach that generates p values), a Bayesian approach, a resampling-based approach, and more. Because frequentist approaches currently dominate the life and health sciences, we’ll only discuss these approaches within this workshop. For good resources on effect sizes and uncertainty in Bayesian approaches see (Kruschke and Liddell 2018; Kruschke 2014).

Interpreting Confidence Intervals

Within a frequentist approach, there are some choices. For a given effect size we could report its standard error (typical expected error), margin of error (largest error expected in a certain % of cases), or a confidence interval. In general, it is best practice to report a confidence interval– this expresses a wider range of uncertainty than a standard error, and it can easily be reported even for effect sizes that can have asymmetrical expected error (like Pearson’s r). The use of a 95% confidence interval has become quite typical, but the level of confidence requires thought and judgement.
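
To make these options concrete, here is a small sketch (with made-up summary numbers) of how the standard error, margin of error, and a 95% confidence interval relate for a single mean:

# Made-up summary statistics for a single group
m <- 3500   # sample mean
s <- 500    # sample standard deviation
n <- 6      # sample size

se  <- s / sqrt(n)                  # standard error: typical expected error
moe <- qt(0.975, df = n - 1) * se   # margin of error for a 95% CI
c(lower = m - moe, upper = m + moe) # the 95% confidence interval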

In general, we would report an effect size alongside its confidence interval. So, for example, suppose you measured neurogenesis in control and stressed mice, finding an average of 3,500 new neurons in control mice and 2,700 new neurons in stressed mice. You would report the mean difference, your key effect size of interest, along with its confidence interval: Mdiff = -800 neurons 95% CI[-1500, -100]. The confidence interval summarizes expected sampling error in your experiment, giving a range of effect sizes that are most compatible with your data.

Confidence intervals help quantify expected sampling variation–the normal differences scientists encounter when drawing random samples from a larger population. Sampling error is typically the least of your problems– you can also miss the truth due to bias and/or measurement error. Thus, you should consider confidence intervals optimistic– they express only one source of uncertainty and do so under an idealized set of conditions that are not typically met in the real world. In fact, in political polling, real error (which can be determined after an election) averages about twice as large as expected sampling error! Conditions within a lab may help limit non-sampling error, so perhaps the polling example is a bit extreme. Still, it is wise to take confidence intervals with a grain of salt.

Another thing to remember about confidence intervals is that it is not helpful to make too big a point about values being outside the confidence interval rather than within it. In the example above, we imagined a stress effect on neurogenesis: Mdiff = -800 neurons 95% CI[-1500, -100]. Note that the confidence interval summarizes a distribution of expected sampling error that stretches out towards infinity in both directions; the 95% CI covers 95% of the area of that distribution. In practice, that means that a reduction of, say, -80 neurons is not dramatically less compatible with the data than the upper limit of the confidence interval (-100). So, within reason, consider the boundaries of a confidence interval fairly fuzzy – both because the decline in compatibility is continuous rather than sharp, and because a confidence interval is, as described above, rather optimistic.

More Complex Effect Sizes, More Uncertainty

Research programs often grow in complexity over time as we dig into a phenomenon and probe its inner mechanisms. For example, it was once an open question if there is any neurogenesis in adult mammalian brains. At this stage, researchers were interested in simply characterizing neurogenesis: how many (if any) new neurons are produced per day/week/month?

Once it was established that there is considerable neurogenesis in the adult brains of lab species (perhaps including humans), research questions became more complex as researchers began to look at the factors influencing neurogenesis and the mechanisms underlying it. For example, researchers began investigating the impact of early-life stress on neurogenesis. This called for examining a difference between groups (Mstressed - Mcontrol), a more complex effect size.

And, again, once a factor important to a phenomenon is established, research questions often become even more complex. So, for example, researchers who have found a stress effect on neurogenesis might want to know if this effect is mediated by cortisol. They thus plan a factorial study manipulating both stress and cortisol function. They are now interested in estimating the degree to which the stress effect is reduced when cortisol signaling is eliminated. (See, for example (Mirescu, Peters, and Gould 2004)).

Why bring all this up? Because it is important to understand that as effect sizes become more complex, uncertainty increases.

Let’s examine this for means. For a single mean, the standard error is SE = s / √N (where s is the sample standard deviation and N is the sample size).

So let’s say we have been conducting simple studies and find that N = 10 gives us reasonably short confidence intervals. Next, though, we want to compare groups, and thus estimate the difference between two means. If we keep the sample size the same (N = 10/group), then our standard error will grow (all else being equal) to SEdiff = √(s²/N + s²/N) = √2 × (s / √N).

That is, assuming everything else stays the same, a difference in means has a standard error 1.4x as big as the standard error for a single group – so we either have to live with that increase in uncertainty or consider a larger sample size.

As you might expect, the problem grows and grows as the effect size becomes more complex. If we want to compare two simple effects (an interaction, or difference on a difference), then we get SEinteraction = √(4s²/N) = 2 × (s / √N).

That is, the expected error for an interaction is 2 times as big as for a single group (assuming sample size and all else stays the same).
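
Here is a quick numeric check of how the standard error grows with the complexity of the contrast, assuming equal standard deviations and equal group sizes (the values are arbitrary):

# Assumed values (arbitrary): same SD and n in every group
s <- 500
n <- 10

se_single      <- s / sqrt(n)        # SE for one mean
se_difference  <- sqrt(2 * s^2 / n)  # SE for a difference between two means
se_interaction <- sqrt(4 * s^2 / n)  # SE for a difference of two differences

se_difference  / se_single   # about 1.41 (i.e., sqrt(2))
se_interaction / se_single   # exactly 2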

It gets even worse when we realize that effect sizes often get smaller as questions get more complex. For example, stress may cut neurogenesis by 1/2. At best blocking cortisol might fully restore neurogenesis–but if it is one of many factors, it might only partially restore neurogenesis, say to 3/4 of typical levels. In that case, then, the difference in simple effects (difference in stress effect between normal and cortisol-blocked animals) will be only about 1/4 of typical neurogenesis, a smaller effect that would typically require more samples to study.
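
Here is the back-of-envelope arithmetic behind that example, with hypothetical numbers:

# Hypothetical values, for illustration only
baseline             <- 3000                # typical neurogenesis
stressed             <- baseline * 1/2      # stress cuts neurogenesis in half
stressed_no_cortisol <- baseline * 3/4      # partial restoration when cortisol is blocked

stress_effect         <- stressed - baseline              # -1500
stress_effect_blocked <- stressed_no_cortisol - baseline  # -750
interaction           <- stress_effect_blocked - stress_effect
interaction / baseline                                    # 0.25: about 1/4 of typical levels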

So - as research questions become more complex, we should expect more uncertainty over smaller effects! That means developing a fruitful line of research is going to be hard. We need either lots of resources, or we need effect sizes so big that even as our questions become more complex we can study them with reasonable sample sizes.

From this discussion there are some important lessons to keep in mind when sample-size planning:

  • More complex questions generally require much bigger samples.

  • It is important to plan your sample size around your most complex research question. The sample size needed for a simple difference in means will not be adequate for comparing effects across contexts!

  • Fruitful science is typically built on developing assays with large initial effects (e.g. fear conditioning, LTP), where complex studies can still be achieved with reasonable sample sizes.

Obtaining Effect Sizes and Confidence Intervals

For researchers used to p values and hypothesis tests, obtaining effect sizes and confidence intervals can initially feel like a challenge. The good news, though, is that for every p value there is always a corresponding effect size and uncertainty interval. In fact, all a hypothesis test with a p value is doing is checking to see if the null-hypothesis value is within the confidence interval. So, although it can feel a bit daunting at first, you can easily leverage what you know about hypothesis testing to obtain effect sizes and confidence intervals.
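
To see that correspondence in action, here is a small sketch with simulated data showing that the p value from a t-test and the confidence interval for the mean difference come from the same model:

# Simulated (fake) data for two groups
set.seed(3)
control  <- rnorm(6, mean = 3500, sd = 500)
stressed <- rnorm(6, mean = 2700, sd = 550)

result <- t.test(stressed, control)  # Welch's t-test by default
result$p.value    # below .05 exactly when...
result$conf.int   # ...the 95% CI for the mean difference excludes 0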

Two Groups with a Quantitative Outcome

Let’s start with a simple two-group design with a continuous variable. Sticking with my obsession with neurogenesis, let’s imagine you have measured adult neurogenesis in adult mice raised under either stressful or normal conditions. You find a mean of 3,500 neurons in the 6 control animals with a standard deviation of 500. In the 6 animals in the stressed group you find a mean of 2,700 neurons with a standard deviation of 550.

You would often reach for a t-test for this analysis. But we can also compute the mean difference between the groups and its confidence interval. Here’s how we would proceed with the statpsych package for R:

if (!require("statpsych")) install.packages("statpsych")
Loading required package: statpsych
statpsych::ci.mean2(
  alpha = 1 - 0.95,     # For 95% CI, we specify alpha = 0.05
  m1 = 2700,
  m2 = 3500,        
  sd1 = 550,
  sd2 = 500,
  n1 = 6,
  n2 = 6
)
                             Estimate       SE         t        df          p
Equal Variances Assumed:         -800 303.4524 -2.636328 10.000000 0.02489034
Equal Variances Not Assumed:     -800 303.4524 -2.636328  9.910515 0.02506589
                                    LL        UL
Equal Variances Assumed:     -1476.134 -123.8660
Equal Variances Not Assumed: -1476.962 -123.0376

You can see the effect size (-800) both for when we assume equal variance (as in a standard t-test) and for when we drop that assumption (as is done in a Welch’s t-test). There are good arguments that we should always prefer not to assume equal variance (when variances actually are equal, both approaches yield about the same results; when variances are not equal, we really need to analyze the data that way… so why not just default to always avoiding the assumption?). So, unless you have specific reasons or have pre-registered your analyses, it is good practice to use the estimate without equal variance assumed.

From this output, we see Mdiff = -800 95% CI[-1476, -123]. Again, we should treat the entire CI (and even values beyond it) as reasonable/compatible with the data (at least in the absence of other evidence).

Note that statpsych also emits the corresponding t-test, which we could also report. The t-test and the confidence interval have an exact correspondence: p will always be less than .05 when the 95% CI does not include 0 (as is the case here); p will always be >= .05 when the 95% CI does include 0. We could make the same statements about the p boundary of .01 and a 99% CI, .1 and a 90% CI, etc. What this means, overall, is that the test and the estimate come from the same model and the same data, and will always have corresponding results: from the confidence interval you know if the finding is statistically significant; from the p value you know if 0 is in the confidence interval. Because the confidence interval tells you about significance and more, there are many who believe we could get by with only reporting the effect size and confidence interval.

Two Groups with a Quantitative Outcome: Compare Medians Rather than Means

Neurogenesis experiments yield neuron counts, and counts are typically skewed, violating the assumptions for a t-test. In these cases, researchers often reach for a non-parametric test. We can do that, too, comparing the medians between both groups (there’s even a good argument that we should almost always do this… but set that aside for now).

To compare medians with statpsych we need the raw data. In the example below I’ve generated fake raw data for a typical neurogenesis experiment and then used statpsych to estimate the median difference between groups:

if (!require("statpsych")) install.packages("statpsych")

# Fake data!
control_data <- c( 3676, 2796, 3322, 3257, 2976, 2895)
stressed_data <- c(2734, 2094, 3766, 2884, 2047, 2778)

statpsych::ci.median2(
  alpha = 1 - 0.95,
  y1 = stressed_data,
  y2 = control_data
)
 Median1 Median2 Median1-Median2      SE        LL       UL
    2756  3116.5          -360.5 448.298 -1239.148 518.1479

Here we obtain a median difference: -361 neurons 95% CI[-1239, 518]. In this fake data, the effect of stress is more ambiguous – the CI is compatible with large decreases in neurogenesis, but also with small declines, no change at all, and even a modest increase. It would take more data to more clearly determine the typical difference. Fortunately, effect sizes can be synthesized through meta-analysis, letting us use multiple small studies to narrow in on the truth (as long as we can get an unbiased collection of studies to synthesize).

Comparing medians roughly corresponds to a Mann-Whitney U test, but the correspondence is not exact in the way it is between the mean difference and the t-test.
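
If you want to see that rough correspondence for yourself, base R’s wilcox.test() runs the Mann-Whitney U (Wilcoxon rank-sum) test on the same fake data; this is just a check, not part of the estimation approach above.

# Same fake data as in the median-difference example above
control_data  <- c(3676, 2796, 3322, 3257, 2976, 2895)
stressed_data <- c(2734, 2094, 3766, 2884, 2047, 2778)

# Mann-Whitney U test for comparison with the median-difference CI
wilcox.test(stressed_data, control_data)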

Two Groups with a Quantitative Outcome: Express Effect Sizes in Different Ways

Effect sizes are quantitative. It is not surprising that quantities can be re-expressed in different units (e.g. length can be expressed in feet, inches, meters, etc.). The same is true for effect sizes: we normally express them in the same units as measured (number of neurons, in our example), but can re-express them in different units.

For example, we can express a mean difference as a mean ratio– quantifying the ratio between the control and treated groups. Ditto for median differences. Here’s the code for a median ratio in statpsych (using the same fake data as in the difference-of-medians example above):

if (!require("statpsych")) install.packages("statpsych")

# Fake data!
control_data <- c( 3676, 2796, 3322, 3257, 2976, 2895)
stressed_data <- c(2734, 2094, 3766, 2884, 2047, 2778)

statpsych::ci.ratio.median2(
  alpha = 1 - 0.95,
  y1 = stressed_data,
  y2 = control_data
)
 Median1 Median2 Median1/Median2        LL       UL
    2756  3116.5       0.8843254 0.6524854 1.198542

We see that stressed animals had 88% of the neurogenesis of controls, 95% CI[65%, 120%].

Note that even if we change the units, our interpretation shouldn’t change (someone who is unusually tall in meters should still be considered unusually tall measured in feet). So, just as we observed ambiguous results when thinking about the difference in medians, we have the same reaction to the ratio: these data are compatible with large reductions in neurogenesis, small reductions, no change, and even a modest increase.

One of the most popular transformations is to express mean differences as standardized mean differences, dividing them by the standard deviation to put them in standard deviation units (Cohen’s d). Here’s the code in statpsych for the mean difference example above:

if (!require("statpsych")) install.packages("statpsych")

statpsych::ci.stdmean2(
  alpha = 1 - 0.95,     # For 95% CI, we specify alpha = 0.05
  m1 = 2700,
  m2 = 3500,        
  sd1 = 550,
  sd2 = 500,
  n1 = 6,
  n2 = 6
)
                          Estimate adj Estimate        SE        LL          UL
Unweighted standardizer: -1.522085    -1.405001 0.7189458 -2.931193 -0.11297695
Weighted standardizer:   -1.522085    -1.405001 0.6702018 -2.835656 -0.20851341
Group 1 standardizer:    -1.454545    -1.224880 0.7595127 -2.943163  0.03407204
Group 2 standardizer:    -1.600000    -1.347368 0.8354639 -3.237479  0.03747924

Note the output is a bit more complex– that’s because what seems like the simple instruction to “divide by the standard deviation” is actually a bit complex. Should we simply average the two groups’ standard deviations (top line)? Or should we pool the standard deviations based on sample size (second line)? Or should we use one of the groups as our standardizer (lines 3 and 4)? Given this complexity (and others), it is usually best to stick with raw-score effect sizes. When you do want to use/interpret Cohen’s d, the average-sd standardizer (top line) corresponds to the assumptions in a Welch’s t-test, and so is probably the best default option.
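
As a sanity check on the first row of that output, dividing the mean difference by the square root of the average of the two variances reproduces the estimate in the top row:

# Summary statistics from the example above
m1 <- 2700; m2 <- 3500
sd1 <- 550; sd2 <- 500

# Unweighted standardizer: square root of the average of the two variances
s_avg <- sqrt((sd1^2 + sd2^2) / 2)
(m1 - m2) / s_avg   # about -1.52, matching the first Estimate row above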

In this example, we see that stress reduced neurogenesis: davg = -1.5 95% CI[-2.9, -0.11]. Just as with the raw-score effect size, we see this is a very large effect, but with considerable uncertainty–the data are compatible with a huge decline (3 standard deviations, almost no overlap between the groups’ distributions), modest declines, and even fairly subtle declines (just over a tenth of a standard deviation). Note the 95% CI does exclude 0, so we know we could reject the null of exactly 0 at a stringency of .05 – but that doesn’t mean that we’ve demonstrated an effect that would be easy to study further!

Complex Designs: Effect Sizes and Confidence Intervals for an Interaction

As research questions become more complex, we turn to factorial designs, in which more than one independent variable is manipulated. Current norms are to analyze factorial designs with an ANOVA, generating F tests for the main effects and interactions. Each F test has an associated effect size known as eta-squared, the percentage of variance accounted for. More often than not, however, the researcher’s main research questions are about the magnitude of mean or median differences across cells. Specifically, complex designs are usually run with the goal of knowing how much one independent variable influences the effect of the other. We answer this question with the difference on the difference, the difference between simple effects. It sounds a bit complicated, but it is simple in practice.

Continuing with our neurogenesis example, imagine you conduct a factorial study to determine the role of cortisol in mediating the effects of stress on neurogenesis. You raise mice under either stressful or normal conditions. In addition, 1/2 the mice in each rearing condition have their adrenal glands removed at a young age; the other 1/2 undergo a sham surgery. You run 6 animals per cell in this 2x2 factorial design.

Below is some fake data and how we can estimate effect sizes for mean differences with statpsych:

if (!require("statpsych")) install.packages("statpsych")

# Fake data!
control_sham_data <- c( 3676, 2796, 3322, 3257, 2976, 2895)
stressed_sham_data <- c(2734, 2094, 3766, 2884, 2047, 2778)
control_adrenelectomy_data <- c(3522, 2050, 2970, 2626, 2697, 3276)
stressed_adrenelectomy_data <- c(2990, 4198, 3275, 2892, 3681, 2865)

# Print cell means
mean(control_sham_data)
[1] 3153.667
mean(stressed_sham_data)
[1] 2717.167
mean(control_adrenelectomy_data)
[1] 2856.833
mean(stressed_adrenelectomy_data)
[1] 3316.833
# Effect sizes
statpsych::ci.2x2.mean.bs(
  alpha = 1 - 0.95,
  y11 = stressed_sham_data,
  y12 = stressed_adrenelectomy_data,
  y21 = control_sham_data,
  y22 = control_adrenelectomy_data
)
          Estimate       SE           t        df          p         LL
AB:      -896.5000 419.1506 -2.13884916 17.392502 0.04691144 -1779.3127
A:         11.7500 209.5753  0.05606576 17.392502 0.95592847  -429.6564
B:       -151.4167 209.5753 -0.72249283 17.392502 0.47959455  -592.8230
A at b1: -436.5000 289.1121 -1.50979500  7.543165 0.17179077 -1110.2877
A at b2:  460.0000 303.4822  1.51573956  9.997743 0.16054463  -216.2212
B at a1: -599.6667 335.2416 -1.78875977  9.724923 0.10478869 -1349.5053
B at a2:  296.8333 251.5956  1.17980342  8.420328 0.27033500  -278.3444
                 UL
AB:       -13.68725
A:        453.15637
B:        289.98971
A at b1:  237.28765
A at b2: 1136.22121
B at a1:  150.17201
B at a2:  872.01109

This is a lot of output to digest. In this call, the manipulation of stress is variable A and the manipulation of cortisol (sham vs. adrenalectomy) is variable B (see the statpsych documentation for how y11 through y22 map onto the factors). Let’s focus on the simple effects of stress (variable A):

  • For sham-operated animals (b1), there is a large but uncertain effect of stress: Mstress_with_cortisol = -437 95% CI [-1110, 237]

  • For adrenalectomized animals (b2), animals in the stressed condition had higher levels of neurogenesis than controls: Mstress_effect_without_cortisol = 460 95% CI[-216, 1136]

  • Thus, adrenalectomy transformed the effect of stress, turning it from a 437-neuron decrement into a 460-neuron increase, a large though somewhat uncertain interaction: MΔΔ = -897 95% CI [-1779, -14]. Note that this interaction effect size has a p value – this will correspond to the p value for the test of this 2x2 interaction in an ANOVA.

We’ve focused here on comparing means – but we could also do the same analysis of a complex design while comparing medians (see statpsych::ci.2x2.median.bs). And, for categorical outcomes, we can compare proportions (see statpsych::ci.2x2.prop.bs). And while this example uses a fully between-subjects design, we can obtain the same effect-size estimates for within-subjects and mixed designs (e.g. see statpsych::ci.2x2.mean.ws and statpsych::ci.2x2.mean.mixed).
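
For example, here is a sketch of the same 2x2 analysis comparing medians rather than means; this assumes ci.2x2.median.bs() takes the same arguments as ci.2x2.mean.bs() (check the statpsych documentation to confirm):

if (!require("statpsych")) install.packages("statpsych")

# Same fake data as in the 2x2 means example above
control_sham_data <- c( 3676, 2796, 3322, 3257, 2976, 2895)
stressed_sham_data <- c(2734, 2094, 3766, 2884, 2047, 2778)
control_adrenelectomy_data <- c(3522, 2050, 2970, 2626, 2697, 3276)
stressed_adrenelectomy_data <- c(2990, 4198, 3275, 2892, 3681, 2865)

statpsych::ci.2x2.median.bs(
  alpha = 1 - 0.95,
  y11 = stressed_sham_data,
  y12 = stressed_adrenelectomy_data,
  y21 = control_sham_data,
  y22 = control_adrenelectomy_data
)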

Lots of Other Effect Sizes

The statpsych package can help you obtain effect sizes for lots of additional research designs. Here’s a sampler:

ci.2x2.mean.bs() Computes tests and confidence intervals of effects in a 2x2 between-subjects design for means
ci.2x2.mean.mixed() Computes tests and confidence intervals of effects in a 2x2 mixed design for means
ci.2x2.mean.ws() Computes tests and confidence intervals of effects in a 2x2 within-subjects design for means
ci.2x2.median.bs() Computes tests and confidence intervals of effects in a 2x2 between-subjects design for medians
ci.2x2.median.mixed() Computes confidence intervals of effects in a 2x2 mixed design for medians
ci.2x2.median.ws() Computes confidence intervals of effects in a 2x2 within-subjects design for medians
ci.2x2.prop.bs() Computes tests and confidence intervals of effects in a 2x2 between-subjects design for proportions
ci.2x2.prop.mixed() Computes tests and confidence intervals of effects in a 2x2 mixed factorial design for proportions
ci.2x2.stdmean.bs() Computes confidence intervals of standardized effects in a 2x2 between-subjects design
ci.2x2.stdmean.mixed() Computes confidence intervals of standardized effects in a 2x2 mixed design
ci.2x2.stdmean.ws() Computes confidence intervals of standardized effects in a 2x2 within-subjects design
ci.agree.3rater() Computes confidence intervals for a 3-rater design with dichotomous ratings
ci.agree() Confidence interval for a G-index of agreement
ci.agree2() Confidence interval for G-index difference in a 2-group design
ci.bayes.normal() Bayesian credible interval for a normal prior distribution
ci.bayes.prop() Bayesian credible interval for a proportion
ci.biphi() Confidence interval for a biserial-phi correlation
ci.bscor() Confidence interval for a biserial correlation
ci.cod() Confidence interval for a coefficient of dispersion
ci.condslope.log() Confidence intervals for conditional (simple) slopes in a logistic model
ci.condslope() Confidence intervals for conditional (simple) slopes in a linear model
ci.cor.dep() Confidence interval for a difference in dependent Pearson correlations
ci.cor() Confidence interval for a Pearson or partial correlation
ci.cor2.gen() Confidence interval for a 2-group correlation difference
ci.cor2() Confidence interval for a 2-group Pearson correlation difference
ci.cqv() Confidence interval for a coefficient of quartile variation
ci.cramer() Confidence interval for Cramer’s V
ci.cronbach() Confidence interval for a Cronbach reliability
ci.cronbach2() Confidence interval for a difference in Cronbach reliabilities in a 2-group design
ci.cv() Confidence interval for a coefficient of variation
ci.etasqr() Confidence interval for eta-squared
ci.fisher() Fisher confidence interval
ci.indirect() Confidence interval for an indirect effect
ci.kappa() Confidence interval for two kappa reliability coefficients
ci.lc.gen.bs() Confidence interval for a linear contrast of parameters in a between-subjects design
ci.lc.glm() Confidence interval for a linear contrast of general linear model parameters
ci.lc.mean.bs() Confidence interval for a linear contrast of means in a between-subjects design
ci.lc.median.bs() Confidence interval for a linear contrast of medians in a between-subjects design
ci.lc.prop.bs() Confidence interval for a linear contrast of proportions in a between-subjects design
ci.lc.reg() Confidence interval for a linear contrast of regression coefficients in multiple group regression model
ci.lc.stdmean.bs() Confidence interval for a standardized linear contrast of means in a between-subjects design
ci.lc.stdmean.ws() Confidence interval for a standardized linear contrast of means in a within-subjects design
ci.mad() Confidence interval for a mean absolute deviation
ci.mann() Confidence interval for a Mann-Whitney parameter
ci.mape() Confidence interval for a mean absolute prediction error
ci.mean.fpc() Confidence interval for a mean with a finite population correction
ci.mean.ps() Confidence interval for a paired-samples mean difference
ci.mean() ci.mean1() Confidence interval for a mean
ci.mean2() Confidence interval for a 2-group mean difference
ci.median.ps() Confidence interval for a paired-samples median difference
ci.median() ci.median1() Confidence interval for a median
ci.median2() Confidence interval for a 2-group median difference
ci.oddsratio() Confidence interval for an odds ratio
ci.pairs.mult() Confidence intervals for pairwise proportion differences of a multinomial variable
ci.pairs.prop.bs() Bonferroni confidence intervals for all pairwise proportion differences in a between-subjects design
ci.pbcor() Confidence intervals for point-biserial correlations
ci.phi() Confidence interval for a phi correlation
ci.poisson() Confidence interval for a Poisson rate
ci.popsize() Confidence interval for an unknown population size
ci.prop.fpc() Confidence interval for a proportion with a finite population correction
ci.prop.inv() Confidence interval for a proportion using inverse sampling
ci.prop.ps() Confidence interval for a paired-samples proportion difference
ci.prop() ci.prop1() Confidence intervals for a proportion
ci.prop2.inv() Confidence interval for a 2-group proportion difference using inverse sampling
ci.prop2() Confidence interval for a 2-group proportion difference
ci.pv() Confidence intervals for positive and negative predictive values with retrospective sampling
ci.random.anova() Confidence intervals for parameters of one-way random effects ANOVA
ci.ratio.cod2() Confidence interval for a ratio of dispersion coefficients in a 2-group design
ci.ratio.cv2() Confidence interval for a ratio of coefficients of variation in a 2-group design
ci.ratio.mad.ps() Confidence interval for a paired-sample MAD ratio
ci.ratio.mad2() Confidence interval for a 2-group ratio of mean absolute deviations
ci.ratio.mape2() Confidence interval for a ratio of mean absolute prediction errors in a 2-group design
ci.ratio.mean.ps() Confidence interval for a paired-samples mean ratio
ci.ratio.mean2() Confidence interval for a 2-group mean ratio
ci.ratio.median.ps() Confidence interval for a paired-samples median ratio
ci.ratio.median2() Confidence interval for a 2-group median ratio
ci.ratio.poisson2() Confidence interval for a ratio of Poisson rates in a 2-group design
ci.ratio.prop.ps() Confidence interval for a paired-samples proportion ratio
ci.ratio.prop2() Confidence interval for a 2-group proportion ratio
ci.ratio.sd2() Confidence interval for a 2-group ratio of standard deviations
ci.rel2() Confidence interval for a 2-group reliability difference
ci.reliability() Confidence interval for a reliability coefficient
ci.rsqr() Confidence interval for squared multiple correlation
ci.sign() Confidence interval for the parameter of the one-sample sign test
ci.slope.mean.bs() Confidence interval for the slope of means in a one-factor experimental design with a quantitative between-subjects factor
ci.slope.prop.bs() Confidence interval for a slope of a proportion in a single-factor experimental design with a quantitative between-subjects factor
ci.spcor() Confidence interval for a semipartial correlation
ci.spear() Confidence interval for a Spearman correlation
ci.spear2() Confidence interval for a 2-group Spearman correlation difference
ci.stdmean.ps() Confidence intervals for a paired-samples standardized mean difference
ci.stdmean() ci.stdmean1() Confidence interval for a standardized mean
ci.stdmean.strat() Confidence intervals for a 2-group standardized mean difference with stratified sampling
ci.stdmean2() Confidence intervals for a 2-group standardized mean difference
ci.tetra() Confidence interval for a tetrachoric correlation
ci.theil() Theil-Sen estimate and confidence interval for slope
ci.tukey() Tukey-Kramer confidence intervals for all pairwise mean differences in a between-subjects design
ci.var.upper() Upper confidence limit of a variance
ci.yule() Confidence intervals for generalized Yule coefficients

References

Kruschke, John K. 2014. Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan, Second Edition.
Kruschke, John K., and Torrin M. Liddell. 2018. “The Bayesian New Statistics: Hypothesis Testing, Estimation, Meta-Analysis, and Power Analysis from a Bayesian Perspective.” Psychonomic Bulletin & Review 25 (1): 178–206. https://doi.org/10.3758/s13423-016-1221-4.
Mirescu, Christian, Jennifer D. Peters, and Elizabeth Gould. 2004. “Early Life Experience Alters Response of Adult Neurogenesis to Stress.” Nature Neuroscience 7 (8): 841–46. https://doi.org/10.1038/nn1290.